30 data analyst interview questions

Preparing for a data analyst interview? Below are 30 questions you may be asked, each with a short, clear answer. They cover SQL, Python, statistics, and data visualization, so they are useful whether you are new to the field or have years of experience. Let's start with the questions.

SQL & Database Management

1. What are JOINs in SQL? Explain different types.
JOINs let you combine rows from two or more tables based on a related column.

INNER JOIN: Only returns matching rows.
LEFT JOIN: All records from the left table + matched records from the right.
RIGHT JOIN: All from the right + matches from the left.
FULL OUTER JOIN: All records from both, matching or not.
CROSS JOIN: Returns all possible combinations (cartesian product).
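To see the difference in row counts, here is a minimal sketch using Python's built-in sqlite3 module. The employees and departments tables and their contents are made up for illustration. RIGHT and FULL OUTER JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

# Hypothetical tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Finance');
    INSERT INTO employees VALUES (1, 'Ana', 1), (2, 'Ben', NULL), (3, 'Cy', 1);
""")

# INNER JOIN: only employees that have a matching department.
inner_rows = conn.execute("""
    SELECT e.name, d.name FROM employees e
    INNER JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()

# LEFT JOIN: every employee; the department is NULL (None) when nothing matches.
left_rows = conn.execute("""
    SELECT e.name, d.name FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()

# CROSS JOIN: every employee paired with every department (3 x 2 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM employees CROSS JOIN departments"
).fetchone()[0]

print(inner_rows)   # [('Ana', 'Sales'), ('Cy', 'Sales')]
print(left_rows)    # [('Ana', 'Sales'), ('Ben', None), ('Cy', 'Sales')]
print(cross_count)  # 6
```

Ben has no department, so he disappears from the INNER JOIN but stays in the LEFT JOIN with a NULL department.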

2. How do you handle duplicate records in SQL?
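Use DISTINCT to drop duplicate rows from a query result. To delete duplicates from the table itself, number the rows in each duplicate group with ROW_NUMBER() in a CTE and remove every row after the first. Deleting through the CTE, as below, works in SQL Server; other databases typically need a DELETE with a subquery or join instead. Example: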

WITH CTE AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY id) AS rn
  FROM your_table
)
DELETE FROM CTE WHERE rn > 1;
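For databases that do not allow deleting through a CTE, one possible variant is to delete by key using a subquery. The sketch below runs that pattern against an in-memory SQLite database from Python; the contacts table is hypothetical, and it assumes SQLite 3.25 or newer for window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO contacts (email) VALUES
        ('a@example.com'), ('b@example.com'), ('a@example.com'), ('a@example.com');
""")

# Number the rows within each email group, then delete every row after the first.
conn.execute("""
    WITH ranked AS (
        SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
        FROM contacts
    )
    DELETE FROM contacts
    WHERE id IN (SELECT id FROM ranked WHERE rn > 1)
""")

remaining = conn.execute("SELECT id, email FROM contacts ORDER BY id").fetchall()
print(remaining)  # [(1, 'a@example.com'), (2, 'b@example.com')]
```

The ORDER BY inside the window decides which copy survives; here it is the row with the lowest id.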

3. What is the difference between WHERE and HAVING clauses?

WHERE: Filters rows before grouping.
HAVING: Filters after grouping using GROUP BY.
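A small sketch of both clauses in one query, run through Python's sqlite3 module. The orders table and its values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL, status TEXT);
    INSERT INTO orders VALUES
        ('ana', 50, 'paid'), ('ana', 70, 'paid'), ('ana', 500, 'cancelled'),
        ('ben', 30, 'paid'), ('ben', 20, 'paid');
""")

rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'        -- row filter, applied before grouping
    GROUP BY customer
    HAVING SUM(amount) > 100     -- group filter, applied after aggregation
""").fetchall()

print(rows)  # [('ana', 120.0)]
```

WHERE removes the cancelled order before the totals are computed, and HAVING then drops ben because his paid total is only 50.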

4. How do you rank rows in SQL using window functions?
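Use RANK(), DENSE_RANK(), or ROW_NUMBER(). RANK() leaves gaps after ties, DENSE_RANK() does not, and ROW_NUMBER() gives every row a unique number:

SELECT name, salary, RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;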
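The three ranking functions only differ when there are ties. The sketch below puts them side by side on a made-up employees table with two equal salaries, using Python's sqlite3 module (assuming SQLite 3.25 or newer for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('Ana', 900), ('Ben', 800), ('Cy', 800), ('Di', 700);
""")

# name is added to the ROW_NUMBER ordering only to make the tie-break deterministic.
ranked = conn.execute("""
    SELECT name,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num
    FROM employees
    ORDER BY salary DESC, name
""").fetchall()

for row in ranked:
    print(row)
# ('Ana', 1, 1, 1)
# ('Ben', 2, 2, 2)
# ('Cy', 2, 2, 3)
# ('Di', 4, 3, 4)
```

After the tie at 800, RANK() jumps to 4 while DENSE_RANK() continues with 3.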

5. Explain normalization and denormalization in databases.

Normalization: Breaks data into multiple tables to reduce redundancy.
Denormalization: Combines tables for faster read performance, at the cost of some redundancy.

6. How do you identify and handle missing values in a database?
Use IS NULL to find missing data and COALESCE() or CASE to substitute a fallback value:

SELECT COALESCE(column_name, 'Default') FROM your_table;
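As a quick illustration, the sketch below finds and fills a missing phone number in a hypothetical customers table, using SQLite from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, phone TEXT);
    INSERT INTO customers VALUES ('Ana', '555-0100'), ('Ben', NULL);
""")

# IS NULL finds the rows with missing values ("= NULL" would match nothing).
missing = conn.execute("SELECT name FROM customers WHERE phone IS NULL").fetchall()

# COALESCE returns its first non-NULL argument.
filled = conn.execute(
    "SELECT name, COALESCE(phone, 'unknown') FROM customers ORDER BY name"
).fetchall()

print(missing)  # [('Ben',)]
print(filled)   # [('Ana', '555-0100'), ('Ben', 'unknown')]
```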

7. Write an SQL query to find the second highest salary from an employee table.

SELECT MAX(salary) FROM employees 
WHERE salary < (SELECT MAX(salary) FROM employees);

8. What is an index, and how does it improve query performance?
An index is a separate lookup structure (often a B-tree) that lets the database find rows without scanning the whole table. It speeds up data retrieval, especially for WHERE, JOIN, and ORDER BY, at the cost of extra storage and slightly slower writes.
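One way to watch an index take effect is to compare the query plan before and after creating it. The sketch below does this with SQLite's EXPLAIN QUERY PLAN from Python; the table, the index name idx_employees_dept, and the data are invented, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees (dept, salary) VALUES (?, ?)",
    [(f"dept{i % 50}", 40000 + i) for i in range(5000)],
)

query = "SELECT id, salary FROM employees WHERE dept = 'dept7'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)   # a full table scan
conn.execute("CREATE INDEX idx_employees_dept ON employees (dept)")
plan_after = plan(query)    # a search that uses idx_employees_dept

print(plan_before)
print(plan_after)
```

Other databases offer a similar EXPLAIN command, though the syntax and output format differ.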

9. Explain CTE (Common Table Expressions) with an example.
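A CTE is a named, temporary result set defined with WITH that exists only for the query it belongs to. It makes complex queries easier to read and lets you reference the same subquery more than once: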

WITH TopSalaries AS (
  SELECT name, salary FROM employees WHERE salary > 50000
)
SELECT * FROM TopSalaries WHERE name LIKE 'A%';

10. What is the difference between INNER JOIN and OUTER JOIN?

INNER JOIN: Only matched rows.
OUTER JOIN: Includes unmatched rows too (LEFT, RIGHT, or FULL).

Python for Data Analysis

11. What are the key libraries used in Python for data analysis?

pandas for data manipulation
NumPy for numerical operations
Matplotlib and Seaborn for visualization
scikit-learn for machine learning
statsmodels for statistics

12. How do you handle missing data in pandas?

Use .isnull(), .notnull() to detect
Use .fillna() to replace
Use .dropna() to remove
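The three steps can be sketched on a tiny DataFrame. The column names and values are made up, and filling the numeric column with its mean is just one possible strategy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", None, "Delhi"],
    "sales": [100.0, np.nan, 300.0],
})

# Detect: count the missing values in each column.
missing_per_column = df.isnull().sum()

# Replace: a different fill value per column (the mean ignores NaN by default).
filled = df.fillna({"city": "Unknown", "sales": df["sales"].mean()})

# Remove: keep only rows with no missing values.
dropped = df.dropna()

print(missing_per_column)
print(filled)
print(dropped)
```

Whether to fill or drop depends on how much data is missing and why; dropping is simpler but discards information.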

13. Explain the difference between apply(), map(), and lambda functions in pandas.

map(): Element-wise transformation of a Series
apply(): Calls a function on each row or column of a DataFrame (or on each value of a Series)
lambda: A Python anonymous function, not a pandas method; often passed to apply() or map()
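A short sketch of how these fit together, using an invented price/quantity DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 1]})

# Series.map: transform each element of one column.
df["price_label"] = df["price"].map(lambda p: f"${p:.2f}")

# DataFrame.apply with axis=1: the lambda is called once per row.
df["total"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# DataFrame.apply with the default axis=0: the lambda is called once per column.
column_max = df[["price", "qty"]].apply(lambda col: col.max())

print(df)
print(column_max)
```

For simple arithmetic like the total, the vectorized form df["price"] * df["qty"] is usually faster than a row-wise apply(); apply() is more useful when the logic cannot be expressed with column operations.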

14. How do you merge and join datasets in pandas?
Use merge() for SQL-style joins on columns, join() to join on the index, and concat() to stack DataFrames along an axis:

df1.merge(df2, on='id', how='inner')
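Here is a fuller sketch with two hypothetical DataFrames, customers and orders. The indicator=True option adds a _merge column that shows where each row came from, which can help when checking a join.

```python
import pandas as pd

customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"id": [1, 1, 4], "amount": [50, 70, 20]})

# Inner join: only ids present in both frames (customer 1, twice).
inner = customers.merge(orders, on="id", how="inner")

# Left join: every customer; amount is NaN where there is no order.
left = customers.merge(orders, on="id", how="left", indicator=True)

# concat: stack frames vertically rather than matching on a key.
stacked = pd.concat([customers, customers], ignore_index=True)

print(inner)
print(left)
print(len(stacked))  # 6
```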

15. Write a Python function to find the mean and median of a dataset.

import numpy as np

def mean_median(data):
    # Return the mean and median of a sequence of numbers as a tuple.
    return np.mean(data), np.median(data)

16. Explain the difference between list comprehension and a for loop.

List comprehension is shorter and more Pythonic:

squares = [x**2 for x in range(10)]

For loop is more flexible for complex logic.

17. What are NumPy arrays, and how do they differ from Python lists?
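NumPy arrays store elements of a single data type in contiguous memory, which makes them faster and more memory-efficient than lists and lets them support vectorized operations. A Python list can hold items of mixed types.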
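A minimal sketch of the difference, with made-up numbers:

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# With a list, arithmetic needs an explicit loop or comprehension.
list_scaled = [p * 1.5 for p in prices]

# With an array, the operation applies to every element at once (vectorized).
arr = np.array(prices)
arr_scaled = arr * 1.5

# Mixed input is coerced to one common dtype; a list keeps each element's own type.
mixed = np.array([1, 2.5])

print(arr_scaled)   # [15. 30. 45.]
print(mixed.dtype)  # float64
```

Note that prices * 1.5 on the plain list would raise a TypeError, since lists do not define element-wise arithmetic.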

18. How do you visualize data using Matplotlib and Seaborn?
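Use plt.plot() and plt.hist() from Matplotlib for basic charts, and sns.barplot(), sns.boxplot(), or sns.heatmap() from Seaborn for statistical plots. For example, assuming a DataFrame df with Category and Value columns:

import matplotlib.pyplot as plt
import seaborn as sns
sns.boxplot(data=df, x='Category', y='Value')
plt.show()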

Statistics & Probability

19. What is the difference between descriptive and inferential statistics?

Descriptive: Summarizes data (mean, median, std. dev.)
Inferential: Draws conclusions about a population from sample data (confidence intervals, hypothesis testing)

20. Explain the concept of p-value in hypothesis testing.
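The p-value is the probability of seeing data at least as extreme as what you observed, assuming the null hypothesis is true. A low p-value (conventionally below 0.05) is taken as evidence to reject the null; it is not the probability that the null is true.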
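To make this concrete, here is a sketch of an exact two-sided p-value for a coin-flip experiment. The scenario (16 heads in 20 flips) and the helper name binomial_two_sided_p are invented for illustration; the null hypothesis is that the coin is fair.

```python
from math import comb

def binomial_two_sided_p(heads, flips):
    """Exact two-sided p-value under a fair-coin null hypothesis (p = 0.5)."""
    # Under the null, a count of k heads has probability C(flips, k) / 2**flips.
    # "At least as extreme" means at least as far from flips / 2 as the observed count.
    distance = abs(heads - flips / 2)
    extreme = sum(
        comb(flips, k) for k in range(flips + 1) if abs(k - flips / 2) >= distance
    )
    return extreme / 2**flips

p_value = binomial_two_sided_p(heads=16, flips=20)
print(round(p_value, 4))  # 0.0118
```

Since 0.0118 is below 0.05, this result would conventionally lead to rejecting the hypothesis that the coin is fair.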

21. What is correlation vs. causation?

Correlation: Variables move together.
Causation: One variable causes the other.
Correlation doesn't imply causation.

22. How do you check if a dataset follows a normal distribution?

Plot a histogram or Q-Q plot
Use tests like Shapiro-Wilk or Kolmogorov-Smirnov

23. Explain the central limit theorem and its significance.
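The CLT says that the sampling distribution of the mean approaches a normal distribution as the sample size grows, even if the population itself isn't normal (given independent observations and finite variance). It's why many statistical methods, such as confidence intervals and t-tests, can be applied to non-normal data.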
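A small simulation can illustrate the idea. The sketch below draws repeated samples from a strongly skewed exponential distribution and looks at the distribution of their means; the sample size, number of samples, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 50
NUM_SAMPLES = 2000

# Exponential distribution with rate 1: mean 1, standard deviation 1, strongly skewed.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)
expected_spread = 1 / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n), about 0.141

print(mean_of_means, spread_of_means, expected_spread)
```

The sample means cluster around the population mean of 1 with a spread close to sigma / sqrt(n), and a histogram of sample_means would look roughly bell-shaped even though the underlying data is not.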

24. What is the difference between Type I and Type II errors?

Type I (False Positive): Rejecting a true null hypothesis
Type II (False Negative): Failing to reject a false null

Data Visualization & BI Tools

25. What are the best practices for creating effective dashboards?

Keep it simple and clean
Use the right charts
Highlight KPIs
Use filters for interaction
Tell a story with data

26. How do you decide between a bar chart and a line chart?

Use bar charts for comparing categories
Use line charts to show trends over time

27. What are the advantages of Power BI over Excel?

Better handling of large data
Interactive dashboards
Easier data refresh & modeling
Built-in DAX for advanced calculations

28. Explain the difference between measures and calculated columns in Power BI.

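Calculated column: Adds a new column to the table; computed row by row when the data is refreshed and stored in the model
Measure: A dynamic calculation evaluated at query time in the current filter context (e.g., sum, average)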

29. What are DAX functions, and how are they used in Power BI?
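DAX (Data Analysis Expressions) is the formula language Power BI uses for measures, calculated columns, and calculated tables. Its functions cover aggregation (SUM, AVERAGE), filtering (CALCULATE, FILTER), and time intelligence (TOTALYTD). Example: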

Total Sales = SUM(Sales[Amount])

30. How do you handle large datasets efficiently in visualization tools?

Use data aggregations
Load only necessary columns
Optimize data types
Use filters (visual-, page-, or report-level) to limit the data each visual loads
Avoid high-cardinality columns in visuals

Code Crush
